Space-time correlations in urban population flows 



A. Hernando 1 , A. Plastino 2 ' 3 
1 Laboratoire Collisions, Agregats, Reactivite, IRSAMC, 
Universite Paul Sabatier 118 Route de Narbonne 31062 - Toulouse CEDEX 09, France 
2 National University La Plata, Physics Institute (IFLP-CCT-CONICET) C.C. 727, 1900 La Plata, Argentina 
3 Universitat de les Illes Balears and IFISC-CSIC, 07122 Palma de Mallorca, Spain 

Evidences are presented concerning tantalizing regularities in cities' population-flows in what 
regards to space and time correlations. The former exhibit a distance-behavior (for large distances) 
compatible with the inverse square law, following an overall Lorentzian dependence with an scale- 
parameter of 74 ±6 km. The later decay exponentially with a characteristic time of 17.2 ± 1.3 years. 
These features can be explained by a dynamical model for cities' population-growth of a Lagevinian 
nature. Numerical simulations based on the model confirm its applicability. The model also allows 
for the identification of collective normal modes of city-growth dynamics that can be empirically 
identified. 



The application of mathematical models to social sci- 
ences has a long and distinguished history [TJ |2] . A large 
number of studies show that the population-evolution 
in urban agglomerations exhibits patterns that can be 
modeled by mathematical laws (Refs. [3]-[5]and references 
therein). In particular, the interaction between cities 
(as measured by, for instance, the number of crossed 
phone calls[7] or human mobility |8J) displays predictable 
characteristics. Thus, it is plausible to conjecture that 
some kind of universality underlies collective human 
behavior[5, 9j. The observation and detection of reg- 
ular space-time patterns in urban-population evolution 
may be viewed as constituting an important step towards 
understanding collective, human dynamics. Indeed, the 
parametrization of such regularities could lead to a po- 
tential improvement of the present population-projection 
tools [IS]. 

Based on official Census-datapT] , we analyzed the time- 
evolution of the population of the n — 8116 Spanish Mu- 
nicipalities in a time-window of 13 years, from 1998 to 
2010, with a total population N of 47021031 in 2010. We 
write the total population at year t (setting t = 1 for the 
year 1998) as N(t) = Yh=i Xi(t), where Xi(t) is the pop- 
ulation of the i-th Municipality at that year. In order to 
guarantee that we take into account internal-flow effects 
we work with relative-populations Xi(t) = Xi(t)/N(t). 
The annual relative population change then reads 

x i (t) = x i (t+l)-x i (t), (1) 

thus obtaining T = 12 data sets for this variable. 
We specifically focus attention upon the pairs of data, 
{xi(t), ii(t)} so as to assess: 

I) The mean value of the population and the variance 
of its annual change in our time window. As shown 
in Ref. [S] some dependence of the later on the for- 
mer is expected. 

II) The spatial dependence of the Pearson product- 
moment correlation coefficient[Y2\ for the annual 
change of each pair of municipalities i, j. 



III) The time-dependence of that correlation coefficient 
for each pair of years of available data. 

IV) Finally, we advance a Langevin equation[13] for the 
evolution of the populations able to reproduce all 
these three characteristics. 

Following the above scheme one has in step I) the mean 
value of the population, the mean annual change, and the 
variance of each population i written as, respectively 
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In the wake of Ref. \S\ we plot in Fig. 1 the pairs 
{(xi),V[xi\/{xi)}. The results nicely fit an expression 
of the type 

V[xi]/(xi) = (T 2 (xi) +cr 2 /2 (3) 

i.e., proportional growth (cr-term) plus finite-size noise 
(o~i/2— noise). A fit of the data to that expression yields 
a = 0.0119 years -1 and a±/2 = 6.9 x 10 -5 years -1 . Pro- 
portional growth becomes dominant for large-population 
cities, while finite-size noise becomes dominant for low- 
populations. As shown in the above cited reference, for 
a network-based model proportional growth depends on 
the actual structure of the social network, while numeri- 
cal noise is a consequence of the Central Limit Theorem. 
It is then to be expected that the former will convey in- 
formation about the nature of the system, while the later 
would constitute just uncorrelated noise. 
Let us pass now to step II), i.e., to spatial corre- 
lations. The distance between cities is for the i, j-th 
ones. The correlation coefficient reads 

->= c °^=7mm- (4) 
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Figure 1: {{xi), V[xi]/{xi)} pairs for each Spanish municipal- 
ity. Red Triangles: median values. Solid black: fit to the 
median following Eq. j3i. Dashed black lines: Finite-size's 
fluctuations are constant, while the multiplicative regime is 
given by a straight line. 



where we use the covariance Cov[&i, Xj] — ([±i — (ii)][ij~ 
(ij))). Fig. 2 displays the empirical distribution p(r,c) 
of the pairs {fij,Cij} for Spanish cities of population 
> 20000 inhabitants (x > 4 X 10~ 4 , n = 396), where pro- 
portional growth clearly dominates over finite-size noise 
(see Fig. f). We base bottom panel of Fig. 2 on an 
histogram of the distance-correlation pairs, normalized 
along the c direction (using intervals of Aln(r) = O.f 
and Ac = 1/15) in a window such that 1.7 < ln(rjj) < 6. 
We have applied a "moving average" of 10 points in the 
ln(r) direction so as to get a smoother surface for guid- 
ing the eye. A clear dependence on the distance becomes 
evident. The ensuing distribution perfectly adjusts the 
expected correlation coefficient's distribution for a bivari- 
ate normal distribution given in Ref. 1141 Using a finite 
number of data-points T, we name this distribution as 
P(c, C, T) where c stands for the correlation-value that 
one might numerically obtains using Eq. Q, and C for 
the actual correlation value. We assume for the latter 
the analytical form 
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where C(0), r , and a are adjustable parameters. 
P(c, C, T) and p(r, c) become then related via p(r, c) = 
P[c,C(r),T] (see Fig. 2). We have found that the em- 
pirical median value of the correlation, measured in the 
same intervals A ln(r) for a window 5 < ny < 1000 km, 
nicely fits the above expression with C(0) = 0.254±0.009, 
r = 74 ± 6 km, a = 2.1 ± 0.3, and a goodness coefficient 
of R 2 = 0.996 (Fig. 2). Remarkably enough, for large 
distances one has C(r) ~ l/r 2 , in agreement with the 
Gravity Model[7]. For the smallest cities (x < 2 x 10~ 5 ) 
no evidence of spatial correlation is encountered because 
of numerical noise cx ^/x, confirming our expectations. 
For mid-populated cities we find a mixture between the 
two regimes. 




Figure 2: Top panel: Empirical mean correlation vs. distance 
(points), fitted to Eq. |5| (line). Bottom panels: empiri- 
cal p(r, c) and theoretical P[c, C(r),T] distributions, together 
with the pertinent mean value (black line). 



We pass now to step III), time-correlations. Con- 
sider i) the n-cities-average and variance such that 

-. n 1 n 

(m = ~x>,(t), nm = - (mr, 
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(6) 

and ii) the ensuing correlation coefficient. We estimate 
time-correlations via the average 



c(At) = 
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Cor[x(t),x(t + At)]. (7) 



Here At adopts the discrete values 1, 2, 3,. . ., T-l. Fig. 3 
depicts results for the major Spanish cities, as in the 
previous case. The ensuing temporal dependence can be 



parameterized using the expression c(At) 



ae 



"^ At . We 



find I/7 = 17.2 ± 1.3 years and a = 0.70 ± 0.02, with 
a goodness coefficient P? = 0.997. Again, correlations 
are not clearly discernible in the case of small-population 
cities, telling us that a finite-size noise, proportional to 
i/x, is indeed independent of time. The transition be- 
tween both regimes is depicted in Fig. 3. We display the 
empirical value of c(Ai = 1) as a function of the popu- 
lation in intervals of A ln(x) = 0.25. Growth from to 
~ 0.7 is clearly visible. 

In our final (IV) step, we advance a dynamical model 
able to reproduce the three empirical results we have en- 
countered above, namely, i) uncorrelated numerical noise, 
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Figure 3: Left: Empirical correlation vs. time (dots) fitted to 
an exponential function (straight line). The red lines repre- 
sent the standard deviation of the sum of Eq. |6]l . Note the 
log-scale on the vertical axis. Right: One-year correlation vs. 
relative population, where the transition from uncorrelated 
finite-size noise regime (zero correlation) to correlated pro- 
portional growth regime (C(l) ~ 0.7) is clearly appreciated. 
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Figure 4: Equilibrium density distribution of the log- 
population and the relative growth obtained in the simula- 
tion (dots), compared with the thermodynamical prediction 
Eq. (20| (lines). 



independent from r, t, for small cities together with pro- 
portional growth for large populations, ii) Lorentz-type 
spatial correlations, and iii) exponential time-correlation. 
We propose here the Langevin-like equation [13_ 



Xi(t) 
Wi(t) 
Vi(t) 

Fi(t) 



Xi(t)Vi(t) + yJxi{i)Wi{t) 

W t (t) 

Fi(t) - jiVi(t) 



(8) 
(9) 
(10) 

(11) 



where i) Wt(t) characterizes a Wiener-process 
({wi(t)wj(t + At)) = V[w]S l3 5(At), where V[w] is 
the corresponding variance), ii) 73 are dumping- 
parameters to be empirically determined, with regards 
to time-correlations, iii) the Rw are matrix-elements 
related to spatial correlations, and iv) fi>(t) are stochas- 
tic forces. We assume that the forces fluctuate and are 
independent of each other, being of the form 

Cov[/<(i), fj(t + At)} = VfSiMAt), (12) 

where Vf stands for the variance of /. For the total force 
F the covariance Cov^t), Fj(t + At)} = K is 



K = V f J2 R W R ji' S ( At ) = VfQijS(At). 



(13) 



The matrix Q = R ■ R T is normalized in such a fash- 
ion that its diagonal elements are all equal to unity, and 
thus cor[Fi(t), Fj(t)} = Qij. Since Wi(t) and fi(t) are not 
correlated by definition, Eq. ^ in automatically fulfilled 
from the variance of ii(t) with a 2 = V[vi] and <J 2 / 2 — 
V[w}. If the w— term is the dominant one (low popula- 
tion) there is no correlation among cities (nor in time) 
since we deal with a Wiener process (no memory, either) . 
For large populations the v— term dominates and we have 



Cij(At) = Cor[x i (t),x j (t + At)} = Cor[vi(t),Vj(t + At)} 
with a general solution to the Langevin equation for v 

Vi (t) = e-"*y (0) + / dre-^-^F^T). (14) 
Jo 

Since the initial time is arbitrary, we assume t — > 00 so 
as to obtain the u— correlation 



Ctf(At) = ca[[vi{t),Vj{t + M)] 



- 7j At 



7,: + 7i ' 



_ (15) 

Note that 2^/777^/(7^ + Jj) G [0, 1], and its mean value 
depends on the distribution of the 7— values. Our goal 
would now be nicely achieved if we could show that Qij = 
1/(1 + | j /ro 1 2 ) . To attain such result, let us start with a 
result of Ref. The number of phone-calls between two 
cities can be fitted to our Lorentzian shape. If we assume 
such a pattern for the information flow in a human social 
network, we may regard the forces F as resulting from 
the average of the stochastic forces / weighted by that 
Lorentzian function. Thus, F becomes a "coarse-grained" 
force. This is reflected by the definition 
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Let us consider for our derivation of Q the continuous 
limit Xi — > x(r), with r a planar spatial coordinate. x(r) 
represents the relative population at r, and the total nor- 
malized population becomes 1 = J s drx(r), where S is 
the pertinent region's area. Since we deal now with the 
coordinates r and r' instead of the indexes i,j, the R 
matrix-elements are a function R(\r — r'|) and the total 
coarse-grained force follows the convolution (®) 



F(r,t) = R(r) ® f(r,t) 



dt 



f 2[2/nr 2 }^f(r',t) 
l+4|(r~r')/r | 2 



(17) 
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Since the convolution of two Lorentzians of equal scale 
is also a Lorentzian with twice that scale-parameter, we 
find for the forces-correlation 



cor[F(r),F(r')] = R(\r - 
= Q(|r-r'|) = 



-r'\)®R(\r-r'\) 
1 

l + |(r-r')/r | 2 ' 



(18) 



i.e., the result we wished to reach. We can now in- 
terpret our equations in the following fashion: i) 

initially, some stochastic, independent forces /, act on 
the cities' populations, ii) the information-flow within 
the population can be characterized via a Lorentzian dis- 
tribution, so that the effective total force becomes the 
convolution of the /j with the distribution, iii) the ob- 
served correlations are thus Lorentzian, and at large dis- 
tances decay as the square of the distance. Finally, set- 
ting 7^ = 1/17 Vi and r — 74 km, our model can re- 
produce the empirical correlations. The result C(0) < 1 
can be attributed to both i) some numerical, uncorrelated 
noise and ii) to the empirical distribution of the dumping 
parameters 7, [as can be verified by looking at Eq. ( 1 5 1 ] . 
As an application we have performed a suitable sim- 
ulation. We randomly select n = 100 positions for 
population-centers on a square surface of side L = 
250 km (see Supplementary Figure). We took Vf = 
10~ 5 years -2 , 7 = 1/17 years" 1 , and r$ = 74 km, forcing 
the population to evolve within the range Xq < X < Xm, 
with X — 1 and Xm = 10 4 people. We do not include 
finite-size effects W for simplicity (^[w] = 0). The com- 
putational cost of solving our Langevin equation can be 
reduced via a normal- mode treatment: define a change- 
of-basis matrix A such that R (and Q) become diagonal. 
This generates new variables u'^t) = A u > logLYj(i)] 
whose motion-equations become 

u' i {t)=v' i {t); v'S) = ^if i {t)-lv' i {t), (19) 



with v'M = Y\, A u >Vi(t% /;(*) = Ei' AwfM and ^ 
is the «-th eigenvalue of R (with that of Q). One eas- 
ily checks that the forces /' are statistically equivalent to 
those indicated by / [i.e., (/£(i)/j(t+At)) = V s 8 l3 5(M% 
so that the simulation involves directly the random gen- 
eration of /', without having to actually effect the basis- 
change. The variables uj(t) evolve in independent fash- 
ion, representing normal- mode evolution. The pres- 
ence of ^fel accounts for different mode-equilibrations 
between /' and the dumping 7, which might be con- 
ceived as originating mass-factors. The Supplementary 
Figure displays the first four modes in such a way that 
the color at the Municipality i represents the coefficient 
Aai for the eigenvector i' (we also shown the same de- 
composition for Catalonia (Spain), using the empirical 
value of ro found above). Our simulations use Verlet- 
integration[Tj)] with an interval St = 1. Equilibrium 
states are detected, compatible with the thermodynam- 
ics developed in Ref. HO We verified that in equilibrium 



([v(t)} 2 ) = Vf/27 = 8.5 x 10~ 5 and that the normalized 
equilibrium distribution p(X, X)dXdX follows the tenets 
applicable to for a thermal system confined in a constant 
volume. As a matter of fact, defining the log-population 
u = \og{X) (X/X = u) one has[5] 



p(X, X)dXdX — p{u,u)dudu 



\ogX /X M 



(20) 



where j3 — 2j/Vf, as depicted in Fig. 4. 

Summing up, our model can be fine-tuned to any 
socio-geographical area via the dumping parameters 7^ 
and the inclusion, in the forces, of any empirically- 
known factor that may affect population growth. There 
is also room to include in the correlation matrix Q 
any other empirically-known distance-independent corre- 
lation. The model may improve on the predictive math- 
ematical tools available today |10J . Also, the study of 
the past evolution of the population in terms of normal 
models ould lead to a deeper understanding of collective 
human behavior at the macro-scale. 
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Supplementary Figure. Top: Components of the first four eigenvectos of the simulated system (from white to red: positive 

values; from white to blue: negative values). The surface of each municipality is defined by its Voronoi area 
I http://en.wikipedia.org/wiki/Voronoi_diagram). Bottom: Components of the first four eigenvectos for Catalonia, using the 
modelled spatial correlations (see text). The radii of the circles are proportional to the log-population. 



